Network Structure Revealed by Short Cycles 
This article explores the relationship between communities and short cycles in complex networks,
based on the fact that nodes more densely connected amongst one another are more likely to be 
linked through short cycles. By identifying combinations of 3-, 4- and 5-edge-cycles, a subnetwork 
is obtained which contains only those nodes and links belonging to such cycles, which can then 
be used to highlight community structure. Examples are shown using a theoretical model (Sznajd 
networks) and a real-world network (NCAA football).
I. INTRODUCTION

Complex networks have attracted growing attention 
because of their non-uniform connectivity patterns, 
which may give rise to node degree power laws and 
hubs, known to play an important role in defining several
topological properties of the networks [1], [2], [3]. More
recently, the fact that many complex networks include 
communities, i.e. sets of nodes which connect more in- 
tensely amongst one another than with the rest of the 
network, has become the focus of increasing attention.
Indeed, because of statistical fluctuations, even random
networks can be found to exhibit communities. Although
we still lack a clear-cut definition of a community, the
problem of identifying communities in complex networks 
continues to motivate interest from researchers because 
of the importance that those structures have for better 
understanding the general organization of such complex 
structures.

Another important feature of complex networks is the
presence of cycles of different lengths, which underlie the
connectivity of several network models [19]. Indeed, the
statistical distribution of cycles has been acknowledged as
particularly important for defining not only the topology
of the respective networks, but also the dynamics of systems
running on such frameworks (e.g. [20]). The latter
is a direct consequence of the fact that cycles, through 
feedback, form the scaffolding of memory in dynamical 
systems. 

Generally, the density of cycles tends to increase as 
more edges are incorporated into a network, with longer 
cycles being observed earlier than shorter ones (e.g. [21]).
Therefore, the density of cycles of different lengths can 
be used as an indicator of the connectivity between any 
subset of nodes. In other words, the larger the number
of shortest cycles among a subset of nodes, the more
connected such nodes are to one another. Longer cycles 
tend to grow, "coiled up", alongside these shorter cycles, 
however, blurring the distinction between nodes based 
solely on short-cycle participation. We present methods 
to overcome this. 

The article starts by presenting the cycle finding algo- 
rithm and its application as the core of the community 
finding algorithm and proceeds by illustrating the appli- 
cation of such a methodology to community finding in 
a theoretical complex network model (i.e. Sznajd
networks [22]) and a real-world football network.



II. DESCRIBING SHORT CYCLES 

For a graph $G = \{V, E\}$, $n = |V|$, $m = |E|$, we are
interested in finding cycles of length 3, 4, or 5 containing
some starting vertex $v \in V$. To describe these cycles we
begin by decomposing $G$ into shells $S_i$ about $v$. We define
shell $S_i$ to be the set of all vertices (and edges between
those vertices) at a distance $i$ from the starting vertex $v$.
Since we are only interested in cycles of length $\le 5$, we
need only keep shells $S_1$ and $S_2$.
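
The shell decomposition described above can be sketched with a truncated breadth-first search. This is only an illustrative sketch: the function name `shells`, the adjacency-dictionary representation, and the toy graph are assumptions made here, not part of the original method description.

```python
from collections import deque

def shells(adj, v, max_shell=2):
    """Return {i: set of vertices at distance i from v} for 1 <= i <= max_shell."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == max_shell:
            continue  # do not expand past the farthest shell we keep
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    out = {i: set() for i in range(1, max_shell + 1)}
    for u, d in dist.items():
        if d >= 1:
            out[d].add(u)
    return out

# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
s = shells(adj, 0)
print(s)  # shells S_1 and S_2 about vertex 0
```

Vertex 4 lies at distance 3 from the starting vertex and is therefore never stored, which matches the observation that only the first two shells are needed for cycles of length at most 5.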

It is simple to describe all possible short cycles using
these shell decompositions. For example, for every edge
$e_{ij}$ in $S_1$ about $v$, there exists a 3-cycle (triangle) $v$-$i$-$j$-$v$.
Similarly, for every path of length 2 or 3 in $S_1$, there
exists a 4- or 5-cycle, respectively. Another 4-cycle and
two more 5-cycles exist involving both $S_1$ and $S_2$.
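
As a small illustration of the first case (one triangle per edge inside the first shell), the following sketch enumerates the 3-cycles through a starting vertex. The helper name `triangles_through` and the toy graph are hypothetical, and vertices are assumed to be comparable (here integers) so that each triangle is reported once.

```python
def triangles_through(adj, v):
    """Triangles v-i-j-v, one for every edge (i, j) lying inside the first shell of v."""
    s1 = adj[v]  # the first shell is simply the neighbourhood of v
    found = set()
    for i in s1:
        for j in adj[i]:
            if j in s1 and i < j:  # i < j avoids listing the same edge twice
                found.add((v, i, j))
    return found

# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
tris = triangles_through(adj, 0)
print(tris)
```

The 4- and 5-cycle cases would follow the same pattern, replacing the single edge inside the first shell by a path of length 2 or 3, or by paths that also pass through the second shell.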

In general, for a cycle of length $L \ge 3$, the number
of such possible "cases" grows with $L$. Since it requires
2 edges to visit a shell, an $L$-cycle can visit at most $J$
shells, where

$$ J = \begin{cases} L/2, & L \text{ even}, \\ (L-1)/2, & L \text{ odd}. \end{cases} \tag{1} $$



*Electronic address: bagrowjp@clarkson.edu
†Electronic address: bolltem@clarkson.edu
‡Electronic address: luciano@if.sc.usp.br



If the farthest shell the cycle visits is $S_j$, $j \le J$, there
are at most $L - 2j$ remaining edges that must be distributed
between and within the $S_1, S_2, \ldots, S_j$ shells. The
number of ways to distribute $L - 2j$ edges over $j$ shells is
$\frac{(L-j-1)!}{(L-2j)!\,(j-1)!}$. However, it is possible for a cycle to "zig-zag"
between shells, using more than the $2j$ edges necessary
to visit the $j$ shells. Therefore, the total number of
possible ways to distribute an $L$-cycle is at least:



$$ N_l(L) = 1 + \sum_{j=2}^{J} \sum_{i=0}^{J-j} \frac{(i+j-2)!}{i!\,(j-2)!} \, \frac{(L-2i-j-1)!}{(L-2j-2i)!\,(j-1)!} \tag{2} $$



with the outer sum accounting for all the possible shells 
the cycle can visit, the inner sum for all the optional pairs 
of edges that can lie between shells and the +1 for the 
one possible cycle that visits the first shell only. Here, 
i is the number of pairs of edges between shells beyond 
the j necessary to visit the j shells. 
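
The lower bound of Eq. (2) can be checked numerically. The sketch below assumes the double sum runs over $j = 2, \ldots, J$ and $i = 0, \ldots, J - j$, with each term being the product of the two binomial-type factors $\frac{(i+j-2)!}{i!\,(j-2)!}$ and $\frac{(L-2i-j-1)!}{(L-2j-2i)!\,(j-1)!}$; the summation limits are an assumed reading of the formula, and the function name `n_lower` is hypothetical.

```python
from math import comb

def n_lower(L):
    """Lower bound N_l(L) on the number of shell 'cases' for an L-cycle (assumed reading of Eq. 2)."""
    J = L // 2  # Eq. (1): L/2 for even L, (L-1)/2 for odd L
    total = 1   # the single case confined to the first shell
    for j in range(2, J + 1):          # farthest shell visited
        for i in range(0, J - j + 1):  # optional extra pairs of inter-shell edges
            zigzag = comb(i + j - 2, i)              # (i+j-2)! / (i! (j-2)!)
            spread = comb(L - 2 * i - j - 1, j - 1)  # (L-2i-j-1)! / ((L-2j-2i)! (j-1)!)
            total += zigzag * spread
    return total

counts = {L: n_lower(L) for L in range(3, 7)}
print(counts)
```

Under this reading the bound gives 1 case for triangles, 2 for 4-cycles and 3 for 5-cycles, in agreement with the cases listed explicitly in the text (one cycle of each length inside the first shell, plus one further 4-cycle and two further 5-cycles involving the second shell).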

This calculation fails to take into account permutations 
of the ordering of edges between and within two adjacent 
shells. A simple upper bound is possible, however, as 
there are certainly no more than $L!$ possible permutations
over the whole network: 

$$ N_u(L) = 1 + \sum_{j=2}^{J} \sum_{i=0}^{J-j} L! \, \frac{(i+j-2)!}{i!\,(j-2)!} \, \frac{(L-2i-j-1)!}{(L-2j-2i)!\,(j-1)!} \tag{3} $$

with

$$ N_l(L) \le N(L) \le N_u(L). \tag{4} $$



The graph $H$ is the graph $G$ containing only edges that do not participate
in $j$-cycles. Separate communities in $G$ will appear
as disconnected components in H. We interpret vertices 
with degree zero in H as communities of size one. 

In specifying H, the question of what to choose for 
j has been left open. For example, choosing just j = 
{3} will correspond to deleting all edges from G that 
participate in 3-cycles, generally not a useful result. One 
may consider j to be a tunable parameter, used to get a 
desired result when applied to a specific network. 

One issue is that longer cycles often
overlap shorter cycles. In terms of communities, most
inter-community edges participate in few (if any) short cycles,
but intra-community edges tend to participate in both long and
short cycles, since a long cycle can "coil" inside the community.
If one were simply to delete every edge that lies on a 5-cycle,
it is quite possible to end up deleting all edges.

There is quite a bit of leeway in how we choose $j$ and
build $H$, and we can use this to our advantage. For example,
pick two cycle lengths $s$ and $t$, $s < t$, and compute
$C_s$ and $C_t$. Then, build another set of edges, $C_{t \setminus s}$,

$$ C_{t \setminus s} = C_t \setminus C_s, \tag{8} $$



containing the set of edges that participate in $t$-cycles
but not $s$-cycles. The graph $H = \{V, C_{t \setminus s}\}$ will contain
edges that tend to be between communities and not
within, for an appropriate choice of $t$ and $s$. One can
think of this as a "backbone" of the network, and deleting
these edges may be a useful pre-processing step for
applying other community-detection algorithms, including
betweenness [3, 10].
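
A minimal sketch of the backbone construction, i.e. the edges that lie on $t$-cycles but not on $s$-cycles, is given below. It uses a brute-force depth-limited path search rather than the shell-based case analysis of the text, and the function name `cycle_edges`, the edge representation (frozensets), and the toy graph are all assumptions made for illustration.

```python
def cycle_edges(adj, length):
    """Edges (as frozensets) lying on at least one simple cycle with `length` edges (length >= 3)."""
    found = set()

    def extend(path, visited):
        u = path[-1]
        if len(path) == length:
            if path[0] in adj[u]:  # the path closes into a cycle
                cyc = path + [path[0]]
                found.update(frozenset(e) for e in zip(cyc, cyc[1:]))
            return
        for w in adj[u]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                extend(path, visited)
                path.pop()
                visited.remove(w)

    for v in adj:
        extend([v], {v})
    return found

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edges 1-4 and 2-3.
adj = {0: {1, 2}, 1: {0, 2, 4}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {1, 3, 5}, 5: {3, 4}}
backbone = cycle_edges(adj, 5) - cycle_edges(adj, 3)  # t = 5, s = 3
print(sorted(tuple(sorted(e)) for e in backbone))
```

In this toy graph the two edges joining the triangles are the only ones that lie on a 5-cycle but on no 3-cycle, so they form the backbone, while every edge inside a triangle is excluded.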



III. CYCLES AND COMMUNITIES 

For a graph $G$, a cycle $C$ is a subset of the set of
edges $E$ containing a continuous path, where the first
and last node of the path are the same [23]. Permutations
of cycles may be ignored since we will be working
exclusively with sets of edges. Throughout this work, we
limit ourselves to short cycles, typically those of length
$l$, $3 \le l \le 6$. These shorter cycles may provide the
advantage of faster calculation times.

Community structure can be studied by comparing the
edges covered by these cycles with the original graph. Let

$$ C_l(i) = \text{the set of edges traversed by all } l\text{-cycles starting from vertex } i. \tag{5} $$

Starting from all vertices and limiting ourselves to only
short $j$-cycles [29],

$$ C = \bigcup_{i \in V} \bigcup_{j} C_j(i). \tag{6} $$

Then, for a graph $G$, we construct a graph $H$ where

$$ H = \{V, E \setminus C\}. \tag{7} $$
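
The construction of $C$ and $H$ can be sketched directly from the definitions: collect the edges of all short cycles through each vertex, take the union over vertices and cycle lengths, and remove those edges from $E$. The names `cycle_edges_from` and `build_H`, the brute-force path search, and the toy graph are assumptions for illustration only.

```python
def cycle_edges_from(adj, start, length):
    """C_l(i): edges on simple cycles with `length` edges passing through `start`."""
    found = set()
    path = [start]

    def extend():
        u = path[-1]
        if len(path) == length:
            if start in adj[u]:  # last vertex links back to the start
                cyc = path + [start]
                found.update(frozenset(e) for e in zip(cyc, cyc[1:]))
            return
        for w in adj[u]:
            if w not in path:
                path.append(w)
                extend()
                path.pop()

    extend()
    return found

def build_H(adj, lengths):
    """Return (E minus C, C), where C is the union of C_j(i) over all vertices i and lengths j."""
    E = {frozenset((u, w)) for u in adj for w in adj[u]}
    C = set()
    for i in adj:
        for j in lengths:
            C |= cycle_edges_from(adj, i, j)
    return E - C, C

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
H_edges, C = build_H(adj, lengths=(3,))
print(sorted(tuple(sorted(e)) for e in H_edges))
```

With only 3-cycles selected, $H$ keeps just the bridge between the two triangles and every other vertex is left with degree zero, which illustrates why the choice of cycle lengths matters.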



IV. APPLICATION EXAMPLES 

We present example applications of the methods presented
in Sections II and III to two networks: a network of
NCAA Division I-A football games held during the 2005
regular season [30] and a Sznajd network [24]. In addition,
we discuss how these methods can break down and
ways to overcome that.



A. Football Network 

In NCAA football, teams are grouped into conferences 
based on location. To save on transportation time and 
cost, more games are played between teams in the same 
conference than in different conferences. Thus, a graph of 
the game schedule, where nodes are teams and edges con- 
nect teams that have played against each other, naturally 
exhibits community structure based on these conferences.

Figure 1(a) displays the original network, call it $G$. As
a first pass, let us use $j = \{3\}$ and generate $G_3 = \{V, C\}$,
pictured in Figure 1(b) using the same layout as 1(a). This
deletes all edges that do not participate in 3-cycles.
Most deleted edges are between conferences, though some
inter-conference edges remain. This will not split the network into separate
components based on the communities, but it may be
useful as a preprocessing step for betweenness or another
community detection algorithm.
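
The first-pass construction $G_3 = \{V, C\}$ with $j = \{3\}$ can be sketched as follows. The example uses a small hypothetical graph rather than the NCAA schedule data, and the helper names `triangle_edges` and `components` are assumptions; an edge lies on a 3-cycle exactly when its two endpoints share a neighbour.

```python
def triangle_edges(adj):
    """Edges whose endpoints share at least one neighbour, i.e. edges on a 3-cycle."""
    return {frozenset((u, w)) for u in adj for w in adj[u] if adj[u] & adj[w]}

def components(vertices, edges):
    """Connected components of the graph (vertices, edges), found by depth-first search."""
    nbrs = {v: set() for v in vertices}
    for e in edges:
        u, w = tuple(e)
        nbrs[u].add(w)
        nbrs[w].add(u)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(nbrs[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Hypothetical toy graph: two "conferences" (triangles) linked by one game, 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
G3_edges = triangle_edges(adj)
comps = components(list(adj), G3_edges)
print(comps)
```

In this idealized case the single inter-group edge lies on no triangle, so keeping only triangle edges separates the two groups completely. In the football network some inter-conference edges do lie on triangles, which is why the same step does not fully split that network.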

In addition, let us build $C_{t \setminus s}$, as per Equation (8). For
this network, we have chosen $t = 5$, $s = 3$. Figure 1(c)
shows $G_{5 \setminus 3} = \{V, C_{t \setminus s}\}$, again using the same layout as
1(a). For improved clarity, Figure 1(d) shows $G_{5 \setminus 3}$ with
a layout emphasizing that all edges are between conferences.

We propose that edges in $C_{5 \setminus 3}$ comprise the majority of
this network's inter-community structure. To test this,
one can compare the distributions of edge betweenness
for these backbone and non-backbone edges, as shown
in Figure 2(a). Backbone edges tend to carry much higher
betweenness values than the more common non-backbone
edges.

B. Sznajd Network 

One particularly interesting category of complex networks
is that of the so-called geographical models (e.g. [27, 28]),
whose nodes have well-defined positions in an embedding
metric space $S$. Typically, the connectivity in such networks
is affected by the adjacency and/or the distance
between pairs of nodes, with nodes which are closer to one
another having higher probability of being connected. As
an immediate consequence of such an organizing principle,
communities in traditional geographical networks
are closely related to the presence of spatial clusters of
nodes, i.e. groups of nodes which are closer to one another
than to the rest of the network. Introduced recently,
the family of geographical networks known as Sznajd networks
[22] allows rich community structure as a consequence
of running the Sznajd opinion formation dynamics
[24] among the network edges instead of considering
the states associated to each network node. Starting with
a traditional geographical network (called the underlying
network $T$) where the connections are defined with
probability proportional to the distances between pairs
of nodes, a percentage of the edges of $T$ is removed, yielding
the initial condition for the Sznajd dynamics. Then,
edges from $T$ are chosen randomly and used to influence
the respective surrounding connectivity. For instance, in
case the chosen edge $(i, j)$ is on (i.e. it does correspond
to a link in the current growth stage), the edges in $T$
which are connected to the nodes $i$ and $j$ are established
with probability $p$. An analogous procedure is considered
with respect to edges which are absent. In order to avoid
convergence to the trivial ground states where all edges
are set on or off, the dynamics also consider as feedback
the total number of established edges.

Figure 3(a) shows a Sznajd network. Edges that do not
participate in 3-cycles are indicated. As can be seen,
many of these edges fall "outside" of the more dense regions
of the network. This is a good first pass, and may
be used to initialize another algorithm, similar to our
football result, but it will not give detailed information
on the hierarchical community structure.

Figure 3(b) shows the same network as 3(a) but with the
edges of $C_{5 \setminus 3}$ highlighted. One can imagine removing
both the $C_3$ and $C_{5 \setminus 3}$ edges to further enhance the
separation.



V. CONCLUDING REMARKS 

The identification and characterization of the communities
present in complex networks stands out as one of
the most important approaches for understanding their
structure and possible formation and evolution. At the
same time, the distribution of cycles of various lengths
in a complex network has important implications for the
connectivity, resilience and dynamics of the network under
study. The current work brought together
these two important trends by applying
short cycle detection as a means to help the identification
of communities in complex networks. The suggested
methodology has been applied with promising results to
the identification of communities in a theoretical network
model, more specifically a Sznajd geographical network,
as well as to a real-world network (NCAA).

The relationship between the cycles and communities
in the football network has been further investigated in
terms of the betweenness centrality measurement, confirming
that the obtained backbone edges tend to exhibit
higher betweenness values.
FIG. 1: (color online) The NCAA Div I-A 2005 regular season with all edges (a), with 3-cycles only (b), and with
just $C_{5 \setminus 3}$ edges (c). (d) is the same graph as (c) but with a layout emphasizing that no edges within conferences
remain (degree zero nodes omitted). As per [26], the conferences are: A = Atlantic Coast, B = Big 12, C =
Conference USA, E = Big East, I = Independent, M = Mid-American, P = Pacific Ten, S = Southeastern, T =
Western Athletic, U = Sun Belt, W = Mountain West, X = Big Ten.
FIG. 2: (color online) Histogram of edge betweenness for non-backbone edges (red) and backbone edges (blue) for
the NCAA 2005 football network (a) and the Sznajd network shown in Figure 3 (b). For the football network, the
mean (unnormalized) betweenness is 42.8 for non-backbone edges and 132.9 for backbone edges. Note that backbone
and non-backbone histograms use the same bins; the front-most bins have been narrowed for clarity. The Sznajd
non-backbone bins have also been scaled down by a factor of 25 for clarity.





FIG. 3: A Sznajd network. Edges that do not participate in 3-cycles are dashed (a). Edges in $C_{5 \setminus 3}$ are bold (b).
Note that nodes of degree zero have been omitted for clarity.